80 research outputs found
"Is a picture of a bird a bird": Policy recommendations for dealing with ambiguity in machine vision models
Many questions that we ask about the world do not have a single clear answer,
yet typical human annotation set-ups in machine learning assume there must be a
single ground truth label for all examples in every task. The divergence
between reality and practice is stark, especially in cases with inherent
ambiguity and where the range of different subjective judgments is wide. Here,
we examine the implications of subjective human judgments in the behavioral
task of labeling images used to train machine vision models. We identify three
primary sources of ambiguity arising from (i) depictions of labels in the
images, (ii) raters' backgrounds, and (iii) the task definition. On the basis
of the empirical results, we suggest best practices for handling label
ambiguity in machine learning datasets.
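One concrete way to act on this kind of recommendation is to keep the full distribution of rater labels instead of collapsing them to a single ground truth, and to flag high-disagreement examples. The sketch below is our illustration, not the paper's method: it measures ambiguity as the Shannon entropy of the rater label distribution.

```python
from collections import Counter
import math

def label_distribution(ratings):
    """Normalize a list of per-rater labels into a label distribution."""
    counts = Counter(ratings)
    total = len(ratings)
    return {label: n / total for label, n in counts.items()}

def label_entropy(ratings):
    """Shannon entropy (bits) of the rater label distribution:
    0.0 means perfect agreement; higher values flag ambiguous examples."""
    return -sum(p * math.log2(p) for p in label_distribution(ratings).values())

# Five hypothetical raters label an image depicting a painting of a bird.
ratings = ["bird", "bird", "painting", "bird", "painting"]
print(label_distribution(ratings))  # {'bird': 0.6, 'painting': 0.4}
print(round(label_entropy(ratings), 3))  # 0.971
```

A dataset pipeline could then route examples above an entropy threshold to adjudication, or ship the distribution itself as a soft label.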
Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs
Large language models (LLMs) have achieved widespread success on a variety of
in-context few-shot tasks, but this success is typically evaluated via
correctness rather than consistency. We argue that self-consistency is an
important criterion for valid multi-step reasoning in tasks where the solution
is composed of the answers to multiple sub-steps. We propose two types of
self-consistency that are particularly important for multi-step reasoning --
hypothetical consistency (a model's ability to predict what its output would be
in a hypothetical other context) and compositional consistency (consistency of
a model's final outputs when intermediate sub-steps are replaced with the
model's outputs for those steps). We demonstrate that multiple variants of the
GPT-3/-4 models exhibit poor consistency rates across both types of consistency
on a variety of tasks.
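The compositional-consistency check described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation harness: `ask_model` is a hypothetical stand-in for any prompt-to-answer LLM call, and the substitution scheme (prepending the model's own sub-step answers to the final prompt) is only one way to compose.

```python
def compositional_consistency(ask_model, question, substeps):
    """Return True when answering `question` directly matches answering it
    after substituting the model's own sub-step answers into the prompt."""
    # Path 1: answer the full question directly.
    direct = ask_model(question)

    # Path 2: answer each sub-step, then re-ask the final question with
    # the model's own intermediate answers prepended as context.
    intermediate = {step: ask_model(step) for step in substeps}
    context = " ".join(f"{step} {answer}." for step, answer in intermediate.items())
    composed = ask_model(f"{context} {question}")

    # A compositionally consistent model gives the same final answer
    # along both paths.
    return direct.strip() == composed.strip()

# Toy "models": a constant answerer is trivially consistent, while one
# whose answer depends on prompt length is not.
constant_model = lambda prompt: "4"
length_model = lambda prompt: str(len(prompt))
print(compositional_consistency(constant_model, "What is 2+2?", ["What is 1+1?"]))  # True
print(compositional_consistency(length_model, "What is 2+2?", ["What is 1+1?"]))    # False
```

Hypothetical consistency would be checked analogously, comparing the model's prediction of its own output in another context against that actual output.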
BLiMP: The Benchmark of Linguistic Minimal Pairs for English
We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP),
a challenge set for evaluating what language models (LMs) know about major
grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each
containing 1000 minimal pairs isolating specific contrasts in syntax,
morphology, or semantics. The data is automatically generated according to
expert-crafted grammars, and aggregate human agreement with the labels is
96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and
Transformer-XL) LMs. We find that state-of-the-art models identify
morphological contrasts reliably, but they struggle with semantic restrictions
on the distribution of quantifiers and negative polarity items and subtle
syntactic phenomena such as extraction islands. Comment: To appear in TACL.
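Minimal-pair evaluation of this kind reduces to comparing sentence probabilities: the LM is credited when it assigns higher probability to the grammatical member of each pair. The sketch below assumes a hypothetical `sentence_log_prob` function (any callable returning the total log-probability an LM assigns to a sentence); the pair shown is illustrative of BLiMP's subject-verb agreement contrasts.

```python
def minimal_pair_accuracy(sentence_log_prob, pairs):
    """Fraction of (grammatical, ungrammatical) minimal pairs for which
    the LM assigns higher log-probability to the grammatical sentence."""
    correct = sum(
        sentence_log_prob(good) > sentence_log_prob(bad) for good, bad in pairs
    )
    return correct / len(pairs)

# Toy scorer standing in for a real LM: a lookup of precomputed scores.
toy_scores = {
    "These casseroles disgust Kayla.": -14.2,   # grammatical, scored higher
    "These casseroles disgusts Kayla.": -16.8,  # agreement violation
}
pairs = [("These casseroles disgust Kayla.", "These casseroles disgusts Kayla.")]
print(minimal_pair_accuracy(toy_scores.__getitem__, pairs))  # 1.0
```

Because the two sentences differ only in the contrast under test, no task-specific training or prompting is needed; this is what makes the benchmark applicable to any model that scores strings.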
Evolutionary Reconstructions of the Transferrin Receptor of Caniforms Supports Canine Parvovirus Being a Re-emerged and Not a Novel Pathogen in Dogs
Parvoviruses exploit transferrin receptor type-1 (TfR) for cellular entry in carnivores, and specific interactions are key to control of host range. We show that several key mutations acquired by TfR during the evolution of Caniforms (dogs and related species) modified the interactions with parvovirus capsids by reducing the level of binding. These data, along with signatures of positive selection in the TFRC gene, are consistent with an evolutionary arms race between the TfR of the Caniform clade and parvoviruses. In addition to the amino acid changes that modify binding, we found that a glycosylation-site mutation in the dog TfR, which conferred resistance to the carnivore parvoviruses circulating before about 1975, predates the speciation of coyotes and dogs. The closely related black-backed jackal has a TfR similar to that of the common ancestor and lacks the glycosylation site; reconstructing this mutation in the jackal TfR demonstrates the potency of that site in blocking binding and infection, and explains the resistance of dogs until recent times. This alters our understanding of this well-known example of viral emergence by indicating that canine parvovirus emergence likely resulted from the re-adaptation of a parvovirus to the resistant receptor of a former host
DataPerf: Benchmarks for Data-Centric AI Development
Machine learning research has long focused on models rather than datasets,
and prominent datasets are used for common ML tasks without regard to the
breadth, difficulty, and faithfulness of the underlying problems. Neglecting
the fundamental importance of data has given rise to inaccuracy, bias, and
fragility in real-world applications, and research is hindered by saturation
across existing dataset benchmarks. In response, we present DataPerf, a
community-led benchmark suite for evaluating ML datasets and data-centric
algorithms. We aim to foster innovation in data-centric AI through competition,
comparability, and reproducibility. We enable the ML community to iterate on
datasets, instead of just architectures, and we provide an open, online
platform with multiple rounds of challenges to support this iterative
development. The first iteration of DataPerf contains five benchmarks covering
a wide spectrum of data-centric techniques, tasks, and modalities in vision,
speech, acquisition, debugging, and diffusion prompting, and we support hosting
new contributed benchmarks from the community. The benchmarks, online
evaluation platform, and baseline implementations are open source, and the
MLCommons Association will maintain DataPerf to ensure long-term benefits to
academia and industry. Comment: NeurIPS 2023 Datasets and Benchmarks Track.
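The core idea of iterating on datasets while the model stays fixed can be illustrated with a minimal selection loop. This is a sketch, not DataPerf's actual API or harness: `train_and_score` is a hypothetical stand-in that trains the fixed model on one candidate dataset and returns a validation score.

```python
def select_best_dataset(train_and_score, candidate_datasets):
    """Data-centric iteration in miniature: the model and training recipe
    inside `train_and_score` stay fixed; only the training data varies."""
    scored = [(train_and_score(data), i) for i, data in enumerate(candidate_datasets)]
    best_score, best_index = max(scored)  # highest validation score wins
    return candidate_datasets[best_index], best_score

# Toy stand-in: "training" simply rewards larger cleaned subsets.
candidates = [["a"], ["a", "b"], ["a", "b", "c"]]
best, score = select_best_dataset(len, candidates)
print(best, score)  # ['a', 'b', 'c'] 3
```

A real submission to a data-centric challenge would replace the toy scorer with an actual train-and-evaluate run, keeping the architecture and hyperparameters constant across candidates so that only data quality is measured.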
Theory and description in African Linguistics: Selected papers from the 47th Annual Conference on African Linguistics
The papers in this volume were presented at the 47th Annual Conference on African Linguistics at UC Berkeley in 2016. The papers offer new descriptions of African languages and propose novel theoretical analyses of them. The contributions span topics in phonetics, phonology, syntax, semantics, and pragmatics and reflect the typological and genetic diversity of languages in Africa. Four papers in the volume examine Areal Features and Linguistic Reconstruction in Africa, and were presented at a special workshop on this topic held alongside the general session of ACAL